Please submit your responses as a nicely formatted notebook (.ipynb) file.
If your answer includes many decimal places (e.g., 3.14159265358979), please round it to a reasonable number of decimals for readability (typically 3 to 4).
For ease of marking, avoid showing the outputs of unnecessary values.
Make sure your code runs without errors and shows all required outputs.
Good coding style matters: clean, organized, and well-commented code will be rewarded. Disorganized, redundant, or poorly structured code may lose marks.
Requirements (IMPORTANT):
Unless otherwise specified, set any trimming you need to do at 20% and use an \(\alpha = 0.05\).
Unless the instructions explicitly state otherwise (e.g., “assume the data are normally distributed”), you are responsible for checking whether the assumptions of the method are reasonable and whether robust approaches need to be used.
If classical test assumptions are not violated, use the classical test.
Do not remove outliers unless explicitly asked to.
Respect the principle of marginality.
Plots, Packages, and Functions:
All plots must be made with ggplot2 and include clear, descriptive axis titles rather than default column names.
Only the following packages are permitted:
tidyverse, WRS2, car
Unless stated otherwise, the following functions are not permitted:
Many people mistakenly believe that behavioural change following punishment depends only on the severity of a consequence. In reality, the effectiveness of punishment procedures depends on more nuanced factors, including the degree of correlation between the response and the punisher (known as contingency) and the delay between the response and the punisher (known as contiguity).
The dataset phone_use_response_cost.csv examines the effects of contingency and delay on phone-use behaviour in students undergoing a classroom behaviour program. Each time a student used their phone during classroom hours, the student sometimes lost cellular and Wi-Fi access on their device for 1 hour (a negative punisher, because access to a valued resource was removed).
Independent Variables:
Punishment contingency was manipulated at four levels:
1.0 = High contingency (loss of connectivity always followed phone use)
0.75 = Medium contingency (loss of connectivity usually followed phone use)
0.25 = Low contingency (loss of connectivity sometimes followed phone use)
0 = No contingency (loss of connectivity was unrelated to, and unpredictable from, phone use)
Importantly, all students experienced the same total number of one-hour connectivity losses across conditions; only the degree to which loss of connectivity was contingent on phone use varied.
Punishment delay was manipulated at four levels:
0 = Immediate (connectivity was removed immediately when the phone was used)
30s = Short delay (connectivity was removed 30 seconds after phone use)
60s = Moderate delay (connectivity was removed 60 seconds after phone use)
300s = Long delay (connectivity was removed 5 minutes after phone use)
Dependent Variable
The effectiveness of the program was measured using a suppression ratio, reflecting the extent to which students reduced their phone use during class relative to baseline. Lower suppression ratios indicate greater suppression of phone use (i.e., more effective punishment). Suppression ratios around 0.5 indicate no suppression.
Assume the data satisfy all the necessary assumptions for a classic 4 × 4 independent ANOVA.
Task
Create a professional-looking interaction plot for suppression ratio as a function of punishment contingency and punishment delay, including 95% confidence intervals.
Display a data frame showing the group means and corresponding confidence-interval boundaries on your plot.
Based on the plot, does the pattern of results suggest an interaction between punishment contingency and punishment delay?
library(tidyverse)

# Load data
phone <- read_csv("data/phone_use_response_cost.csv")

# Factor columns and organize
phone$contingency <- factor(phone$contingency)
phone$delay <- factor(phone$delay, levels = c("0s", "30s", "60s", "300s"))

# Get stats
plot_data <- phone |>
  group_by(contingency, delay) |>
  summarise(
    n = length(supp_rat),
    m = mean(supp_rat),
    se = sd(supp_rat) / sqrt(n),
    df = n - 1,
    alpha = 0.05,
    t_crit = abs(qt(alpha / 2, df = df)),
    low_ci = m - t_crit * se,
    top_ci = m + t_crit * se
  )

plot_data |> select(contingency, delay, m, low_ci, top_ci)
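The plot itself can then be built from plot_data; a minimal ggplot2 sketch (the axis titles, legend label, and theme here are illustrative choices, not prescribed by the assignment):

```r
ggplot(plot_data, aes(x = delay, y = m, colour = contingency, group = contingency)) +
  geom_errorbar(aes(ymin = low_ci, ymax = top_ci), width = 0.2) +
  geom_line() +
  geom_point(size = 2) +
  labs(
    x = "Punishment Delay",
    y = "Mean Suppression Ratio",
    colour = "Punishment Contingency"
  ) +
  theme_minimal()
```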
The four lines are not parallel, suggesting a possible interaction.
Note
Contingency could have been put on the x-axis instead.
Question 2
Using the phone_use_response_cost.csv data, fit a classic ANOVA model that predicts suppression ratio from both contingency and delay, including their interaction (i.e., a classic 4 × 4 factorial ANOVA).
The researchers were interested in a set of ordered, theory-driven contrasts for each independent variable, reflecting progressively finer distinctions in the effectiveness of their punishment method.
Contingency:
Maximal contingency hypothesis: Whether perfect contingency (1.00) produced greater suppression of phone use than the average of all lower contingency conditions (0.75, 0.25, 0).
Moderate contingency hypothesis: Whether high but imperfect contingency (0.75) produced greater suppression than the average of weak or absent contingency conditions (0.25, 0).
Minimal contingency hypothesis: Whether weak contingency (0.25) produced greater suppression than no contingency (0).
Delay:
Immediate punishment hypothesis: Whether immediate punishment (0 s) produced greater suppression of phone use than the average of all delayed punishment conditions (30, 60, 300 s).
Short vs long delay hypothesis: Whether a short delay (30 s) produced greater suppression than the average of longer delays (60 s, 300 s).
Long-delay differentiation hypothesis: Whether moderate (60 s) and long (300 s) delays differed in their effectiveness.
When creating your model (with interactions), set up custom contrasts that evaluate these hypotheses.
For each main effect and interaction, report the F-ratio, degrees of freedom, and p-value. For this question, all functions from allowed packages are permitted.
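One way to encode these hypotheses is with reverse-Helmert-style weights. This sketch assumes the factor levels are ordered 0, 0.25, 0.75, 1 (contingency) and 0s, 30s, 60s, 300s (delay); check levels() on your own factors before assigning the weights:

```r
contrasts(phone$contingency) <- cbind(
  maximal  = c(-1, -1, -1, 3),   # 1.00 vs. mean of all lower levels
  moderate = c(-1, -1,  2, 0),   # 0.75 vs. mean of 0.25 and 0
  minimal  = c(-1,  1,  0, 0)    # 0.25 vs. 0
)
contrasts(phone$delay) <- cbind(
  immediate = c(3, -1, -1, -1),  # 0s vs. mean of all delays
  short     = c(0,  2, -1, -1),  # 30s vs. mean of 60s and 300s
  long      = c(0,  0,  1, -1)   # 60s vs. 300s
)

model <- lm(supp_rat ~ contingency * delay, data = phone)
car::Anova(model)  # F-ratios, dfs, and p-values for main effects and interaction
```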
Contingency had a large, significant effect on suppression ratio. Delay also produced a large, significant effect. Lastly, there was a small but significant interaction between contingency and delay.
Question 4
Are all the contrasts you created centered and orthogonal? Demonstrate your answer with R code, and display the table(s) of contrast values/weights used.
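Assuming contrasts were assigned as in the previous question, both properties can be checked directly: a centred contrast's weights sum to zero, and mutually orthogonal contrasts have zero off-diagonal cross-products.

```r
# Centred: each column should sum to 0
colSums(contrasts(phone$contingency))
colSums(contrasts(phone$delay))

# Orthogonal: all off-diagonal entries should be 0
crossprod(contrasts(phone$contingency))
crossprod(contrasts(phone$delay))
```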
immediate short long
immediate 12 0 0
short 0 6 0
long 0 0 2
Question 5
Display the results table for your contrasts. Then, for each of the six hypotheses (main effects contrasts), state the conclusion you draw from these in plain English (i.e., without using numerical results).
# Contrast results (rounding for readability)
results <- round(summary(model)$coefficients, 3)
results
Perfect contingency suppresses phone use more than all weaker or absent contingencies combined.
Moderate contingency hypothesis:
High but imperfect contingency suppresses phone use more than weak or absent contingency.
Minimal contingency hypothesis:
Even a weak contingency suppresses phone use more than no contingency at all.
Immediate punishment hypothesis:
Immediate punishment suppresses phone use more than punishment delivered after any delay.
Short vs long delay hypothesis:
Punishment delivered after a short delay suppresses phone use more than punishment delivered after longer delays.
Long-delay differentiation hypothesis:
Punishment delivered after a moderate delay suppresses phone use more than punishment delivered after a very long delay.
Note
For full marks you needed to indicate the direction of the change, not just whether there was a change in suppression.
Question 6
Considering the potential interactions, does a perfect contingency provide an advantage over weaker contingencies when the delay increases from moderate to long?
Report/extract the following:
Test-statistic
Degrees of freedom
P-value
Partial \(\eta^2\)
# Note: this specific test is found in row 14 of the coefficient table.
# 'effects' is assumed to be the ANOVA table for the model (e.g., from car::Anova()).
t_stat <- results[14, 3]
df <- effects$Df[4]

# Partial eta^2
eta_sq <- t_stat^2 / (t_stat^2 + df)

# Results
paste0("Test Stat = ", round(t_stat, 5))
paste0("DF = ", df)
paste0("p = ", results[14, 4])
paste0("Partial eta^2 = ", round(eta_sq, 5))
The difference between perfect contingency and weaker contingencies is not the same at 60 seconds as it is at 300 seconds. Specifically, moving from a moderate to long delay reduces the punishing effect at weaker (non-perfect) contingencies.
Question 7
A blogger named Josh Madison tabulated the amount of each M&M colour in a variety of M&M packages. The data can be found in MM_Madison.csv. Calculate a p-value to determine whether the data provide evidence that the colours are equally represented overall, or are some colours systematically more common than others across packages? Assume that the amount of each colour in a package is independent of the other colours.
library(tidyverse)

# Load data and convert to wide format
mm <- read_csv("data/MM_Madison.csv") |>
  pivot_wider(names_from = colour, values_from = amount) |>
  select(-c(pkg, weight_oz, year))

# Calculate sums (observed frequencies)
obs <- colSums(mm)

# Expected frequencies
N <- sum(obs)
exp <- rep(N / 6, 6)

# Chi-square test statistic
chi_sq <- sum(((obs - exp)^2) / exp)

# Degrees of freedom
df <- 6 - 1

chi_sq
[1] 65.7374
pchisq(chi_sq, df = df, lower.tail = FALSE)
[1] 7.879637e-13
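As a sanity check, base R's chisq.test() applied to the vector of observed counts (its default is equal expected proportions) should reproduce the statistic and p-value computed by hand above:

```r
chisq.test(obs)
```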
Question 8
Assuming you calculated the previous question correctly, the test you conducted has a power of 0.98 to detect a small effect.
What does this imply about how the result of the hypothesis test should be interpreted? (You do not need to perform any additional calculations.)
The high power means that, if there is a true difference in colours, we can be very confident it will be detected. However, power above 0.9 is often more than necessary and allows even trivial departures from the null hypothesis to be detected (i.e., there may be a true difference, but not one large enough to care about). Consequently, p-values should be interpreted alongside effect sizes to gauge the practical importance of the difference.
Other points you could potentially make
Waste of resources.
Ethically questionable (though not in this case).
Test ceases to be a test (it’s a guarantee).
Question 9
Holiday traditions often come with strong opinions (especially about candy). Some people swear that having a little holiday candy nearby makes stressful moments feel more manageable, while others think it’s just sugar and wishful thinking.
A local psychologist wanted to test whether having holiday candy available during a stressful task affects how anxious people feel.
To test this, participants were asked to complete a timed puzzle task in a softly lit room while a loop of chaotic “holiday mall music” played in the background (to create a mildly stressful, seasonally-themed atmosphere).
In one condition, a small plate of holiday sweets was beside them and they were told they could look at it but not eat it during the task. In the other condition, no sweets were present.
During each session, the psychologist measured mean heart rate (in beats per minute) using a chest-strap monitor. Higher heart rate is presumed to indicate greater physiological arousal/anxiety.
The data file holiday_candy_hr.csv contains the following columns:
id — Unique random ID for each participant.
condition — “Candy Present” or “No Candy”.
hr_bpm — Heart rate (beats per minute) recorded during the task.
The researcher hypothesizes that the “No Candy” condition will produce higher levels of anxiety (i.e., higher heart rate) than the “Candy Present” condition. Conduct an appropriate statistical test to evaluate whether the data provide sufficient evidence to support the researcher’s hypothesis.
Report all of the following:
Null and alternative hypothesis
Test statistic
Degrees of freedom
p-value
95% confidence interval
# Load data (note the data file's rows are out of order)
hr <- read_csv("data/holiday_candy_hr.csv") |>
  arrange(id)

head(hr) # look at first 6 rows (observations are PAIRED)
# A tibble: 6 × 3
id condition hr_bpm
<dbl> <chr> <dbl>
1 1 No Candy 84.53742
2 1 Candy 73.57114
3 2 Candy 88.03545
4 2 No Candy 70.87695
5 3 No Candy 93.24787
6 3 Candy 62.31203
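Because the observations are paired (and, as noted for the next question, non-normal), a robust dependent-samples test is appropriate. One possible sketch uses Yuen's paired trimmed-means test from the permitted WRS2 package; the reshaping and column names assume the condition labels shown in the head() output above:

```r
library(WRS2)

# Reshape so each participant's two scores sit in one row
wide <- hr |>
  pivot_wider(names_from = condition, values_from = hr_bpm)

# Yuen's paired test on 20% trimmed means; H1: No Candy > Candy.
# yuend() reports a two-sided p-value; for the directional hypothesis,
# halve it (provided the effect is in the predicted direction).
yuend(wide$`No Candy`, wide$Candy, tr = 0.2)
```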
Create an appropriate bar plot to visualize the results from the previous question. Include two-sided 95% confidence intervals for each bar. Additionally, display a data frame containing the summary statistics shown in the plot.
Note
The data is paired and non-normal. The plot needs to account for those facts.
library(WRS2) # for winvar()

# Note: diff_hr (the No Candy minus Candy difference scores) and N were
# computed for the previous question's test.

# Add difference scores to data
diff_df <- data.frame(
  id = 1:N,
  condition = "Difference",
  hr_bpm = diff_hr
)
hr <- rbind(hr, diff_df)

# Factoring to adjust condition ordering
hr$condition <- factor(hr$condition, levels = c("No Candy", "Candy", "Difference"))

# Get stats for plot
plot_data <- hr |>
  group_by(condition) |>
  summarise(
    n = length(hr_bpm),
    G = 0.2,
    m = mean(hr_bpm, tr = G),
    s = sqrt(winvar(hr_bpm, tr = G)),
    se = s / ((1 - 2 * G) * sqrt(n)),
    h = n - 2 * floor(G * n),
    df = h - 1,
    alpha = 0.05,
    t_crit = abs(qt(alpha / 2, df = df)),
    low_ci = m - t_crit * se,
    top_ci = m + t_crit * se
  )

# Data frame
plot_data |> select(condition, m, low_ci, top_ci)
# A tibble: 3 × 4
condition m low_ci top_ci
<fct> <dbl> <dbl> <dbl>
1 No Candy 79.87594 77.85341 81.89848
2 Candy 76.00632 74.32401 77.68863
3 Difference 3.261754 0.7917384 5.731770
Over the years, several Alpine villages have reported a chilling pattern of wintertime disturbances during the Advent season. Local folklore attributes these events to Krampus, a legendary horned figure said to justly punish the wicked in the weeks leading up to Christmas. According to tradition, Krampus is most active during cold, dark nights, particularly when supernatural conditions are said to be strongest.
A group of cultural anthropologists and behavioural scientists has compiled a historical dataset of reported Krampus-related incidents from village records spanning 1890–2024. The dataset, krampus_incidents.csv, contains the following variables:
incident_severity: a continuous measure of severity of the incident developed by the researchers. It is on a scale of 0–100, where 100 indicates extreme harm or disruption.
snow_density: a continuous measure of snowfall intensity on the night of the incident (in % opacity, 0–100).
guard_exp_lvl: experience level of any village night guard present:
“None” (no guard present),
“Novice” (inexperienced guard),
“Experienced” (seasoned guard).
krampusnacht: categorical variable (Yes = 1, No = 0) indicating whether the incident occurred on Krampusnacht (December 5th), traditionally believed to be the peak of Krampus activity.
The researchers hypothesize that harsh environmental conditions (e.g., heavy snowfall) may increase the severity of incidents, particularly on Krampusnacht. They are also interested in whether the presence and experience level of village guards moderates these effects.
Your Task
Plot incident severity as a function of snow density, separately for incidents occurring on Krampusnacht and not on Krampusnacht.
Display separate OLS regression lines (incident_severity ~ snow_density) for Krampusnacht and non-Krampusnacht nights.
Format the plot to have a professional appearance suitable for inclusion in an academic article (because this is definitely real data you are analyzing).
If necessary, adjust the plot size to ensure optimal display within Google Colab.
There are different approaches you could take for this.
Option 1: Plot facets - space is not used optimally with this method, but it provides the clearest contrast of the two categories and makes it easy to visualize potential trends.
Option 2: Distinguish the categories via aesthetics (in particular the shape aesthetic so that the plot is robust to colour blindness). Care needs to be taken to make sure “Yes” values are not hidden by overplotting. This is tricky to do effectively and elegantly.
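A minimal sketch of Option 1 (faceting); the column names assume the variables described above, and the styling choices are illustrative rather than required:

```r
library(tidyverse)

krampus <- read_csv("data/krampus_incidents.csv")

ggplot(krampus, aes(x = snow_density, y = incident_severity)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~ krampusnacht) +
  labs(
    x = "Snow Density (% opacity)",
    y = "Incident Severity (0-100)"
  ) +
  theme_bw()
```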
Using the krampus_incidents.csv data, create the following ordinary least-squares regression models using lm():
Model 1:
Predict incident severity as a function of snow density.
Model 2:
Extend Model 1 by adding the guard experience level.
Model 3:
Extend Model 2 by adding Krampusnacht.
Use the following coding scheme for the categorical variables:
Guard Experience:
| Level       | \(X_2\) | \(X_3\) |
|-------------|---------|---------|
| None        | 0       | 0       |
| Novice      | 1       | 0       |
| Experienced | 0       | 1       |
Krampusnacht:
| Level | \(X_4\) |
|-------|---------|
| No    | 0       |
| Yes   | 1       |
Report each model’s equation and \(R^2\) value, and interpret the coefficients of the newly added predictor(s) in plain English.
# Factor categories
# Putting in custom order for convenience
krampus$guard_exp_lvl <- factor(krampus$guard_exp_lvl, levels = c("None", "Novice", "Experienced"))

# Set dummy coding
# The ifelse() function could also be used for this.
Novice <- c(0, 1, 0)
Experienced <- c(0, 0, 1)
contrasts(krampus$guard_exp_lvl) <- cbind(Novice, Experienced)

Yes <- c(0, 1)
contrasts(krampus$Dec_5) <- cbind(Yes)

# Models
mod1 <- lm(incident_severity ~ snow_density, data = krampus)
mod2 <- lm(incident_severity ~ snow_density + guard_exp_lvl, data = krampus)
mod3 <- lm(
  incident_severity ~ snow_density + guard_exp_lvl + Dec_5,
  data = krampus
)

mod1sum <- summary(mod1)
mod2sum <- summary(mod2)
mod3sum <- summary(mod3)
Model 1
\(\hat{y} = 16.369 + 0.563 x\)
\(R^2 = 0.381\)
For every one percentage point increase in snow density, the incident severity increases by 0.563 points on average.
Model 3 accounts for a statistically significant additional proportion of the variance (17%) and is therefore the preferred model.
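The model-comparison claim above can be verified with an incremental F-test and a ΔR² calculation; a sketch assuming the model objects fitted earlier:

```r
# Does adding Krampusnacht (Model 3) significantly improve on Model 2?
anova(mod2, mod3)

# Additional proportion of variance explained by Model 3
mod3sum$r.squared - mod2sum$r.squared
```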
Question 14
Refit model 3 using ordinary least squares (lm()), after identifying and removing potential outliers with a robust outlier-detection method. List all observations that were removed, and report the fitted regression equation and its \(R^2\) value.
# Note: outliers should only be removed based on continuous independent/predictor
# variables (i.e., snow density).

# MADN stat
madn <- median(abs(krampus$snow_density - median(krampus$snow_density))) / 0.6745

# Detect outliers
krampus$outliers <- (abs(krampus$snow_density - median(krampus$snow_density)) / madn) > 2.24

# Outlier list
krampus |> filter(outliers == TRUE)
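The flagged observations can then be dropped and Model 3 refit; a sketch reusing the formula and column names from the earlier models:

```r
# Refit Model 3 without the flagged outliers
krampus_clean <- krampus |> filter(!outliers)
mod3_clean <- lm(
  incident_severity ~ snow_density + guard_exp_lvl + Dec_5,
  data = krampus_clean
)

coef(mod3_clean)                # fitted regression equation
summary(mod3_clean)$r.squared   # R^2
```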
Participants listened to an audio recording of an argument between two coworkers and were then asked: “How intense was the disagreement between the coworkers when they _______ with each other?”
The blank was filled with one of four verbs: “argued,” “disagreed”, “debated,” or “discussed.”
Participants responded by selecting one of three options: “very intense,” “moderately intense,” or “not very intense.”
The results are summarized in the following table:
|           | Very intense | Moderately intense | Not very intense |
|-----------|--------------|--------------------|------------------|
| Argued    | 74           | 36                 | 30               |
| Disagreed | 58           | 60                 | 42               |
| Debated   | 45           | 68                 | 47               |
| Discussed | 26           | 44                 | 80               |
Does the wording of the question (i.e., the verb used) significantly influence participants’ judgments of intensity?
Conduct an appropriate statistical test and report your conclusion, including the test statistic, degrees of freedom, and p-value.
data <- data.frame(
  very = c(74, 58, 45, 26),
  moderate = c(36, 60, 68, 44),
  not_very = c(30, 42, 47, 80)
)
rownames(data) <- c("argued", "disagreed", "debated", "discussed")

# Get row totals, column totals, and overall total
RowTotal <- apply(data, MARGIN = 1, FUN = sum)
ColTotal <- apply(data, MARGIN = 2, FUN = sum)
N <- sum(RowTotal)  # total number of participants

# Get expected frequencies
expected <- RowTotal %*% t(ColTotal) / N

# Calculate the test statistic
chiSq <- sum(((data - expected)^2) / expected)

# Degrees of freedom
df <- (4 - 1) * (3 - 1)

# p-value
p <- pchisq(chiSq, df = df, lower.tail = FALSE)

# Results
paste0("Test Stat = ", round(chiSq, 5))
paste0("DF = ", df)
paste0("p = ", round(p, 5))
Yes, the manner in which the question is worded does appear to influence participants’ intensity judgments (i.e., the question wording is not independent of the responses participants make).
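As a cross-check on the hand calculation, base R's chisq.test() run on the contingency table should give the same statistic, degrees of freedom, and p-value:

```r
chisq.test(as.matrix(data))
```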